
Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Safety Alignment

Wang, Jiongxiao, Li, Jiazhao, Li, Yiquan, Qi, Xiangyu, Hu, Junjie, Li, Yixuan, McDaniel, Patrick, Chen, Muhao, Li, Bo, Xiao, Chaowei

arXiv.org Artificial Intelligence

Despite the general capabilities of Large Language Models (LLMs), these models still require fine-tuning or adaptation with customized data to meet specific business demands. However, this process inevitably introduces new threats, particularly the Fine-tuning based Jailbreak Attack (FJAttack) under the setting of Language-Model-as-a-Service (LMaaS), where the model's safety can be significantly compromised by fine-tuning on users' uploaded examples that contain just a few harmful examples. Though potential defenses have been proposed in which service providers integrate safety examples into the fine-tuning dataset to reduce safety issues, such approaches require incorporating a substantial amount of data, making them inefficient. To effectively defend against the FJAttack with limited safety examples under LMaaS, we propose the Backdoor Enhanced Safety Alignment method, inspired by an analogy with the concept of backdoor attacks. In particular, service providers construct prefixed safety examples with a secret prompt, acting as a "backdoor trigger". By integrating prefixed safety examples into the fine-tuning dataset, the subsequent fine-tuning process effectively acts as the "backdoor attack", establishing a strong correlation between the secret prompt and safe generations. Consequently, safe responses are ensured once service providers prepend this secret prompt ahead of any user input during inference. Our comprehensive experiments demonstrate that through Backdoor Enhanced Safety Alignment, adding as few as 11 prefixed safety examples lets maliciously fine-tuned LLMs achieve safety performance similar to that of the original aligned models without harming benign performance. Furthermore, we also demonstrate the effectiveness of our method in a more practical setting where the fine-tuning data consists of both FJAttack examples and the fine-tuning task data.
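The defense described in the abstract can be sketched as follows; this is a minimal illustration of the data flow only, with a hypothetical secret prompt, example records, and helper names (the paper's actual secret prompt and training setup are not reproduced here):

```python
# Hypothetical placeholder; in the paper the provider chooses a secret
# prompt that fine-tuning users never see.
SECRET_PROMPT = "<provider-chosen secret prompt>"

def build_finetuning_dataset(user_examples, safety_examples):
    """Mix the user's uploaded examples with prefixed safety examples.

    Each safety example gets the secret prompt prepended to its
    instruction, so fine-tuning correlates the prompt (the "backdoor
    trigger") with safe refusals.
    """
    prefixed = [
        {"instruction": SECRET_PROMPT + " " + ex["instruction"],
         "response": ex["response"]}  # e.g. a safe refusal
        for ex in safety_examples
    ]
    return user_examples + prefixed

def guarded_inference(model, user_input):
    """At serving time, the provider prepends the same secret prompt
    ahead of the user input, triggering the safety behavior learned
    during fine-tuning."""
    return model.generate(SECRET_PROMPT + " " + user_input)
```

Note that the trigger only works because the same string is used at both stages: it appears in front of every safety example during fine-tuning and in front of every user input at inference.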


The Long Shadow of the 'Nigerian Prince' Scam

WIRED

In November 2021, Oluwaseun Medayedupin was arrested by the Nigerian police in Lagos. An investigation found that he had been pursuing "disgruntled employees" from American companies and pushing them to release ransomware on internal enterprise servers, offering a percentage of the cut if they agreed to collaborate in the attack. This was a sophisticated social engineering scheme, far more advanced than the notorious "Nigerian prince" emails that have made the country of Nigeria synonymous with scams. The origins of these types of scams may be attributed to a boom in the establishment of cybercafes during the 1990s, coinciding with falling oil prices in Nigeria and a rise in unemployment. Add in a lack of national social security, and many Nigerians were forced to seek out alternative forms of employment--physical labor; gig work; and, most notoriously, cybercrime.


Five Ways You're Already Using Machine Learning: A Day with AI - insideBIGDATA

#artificialintelligence

In this special guest feature, Mark Scott, CMO at Apixio, highlights the prevalence of machine learning in everyday life and offers five ways you're (probably) already using machine learning, all without realizing or thinking about it. Mark has more than 19 years of medical technology and health care provider marketing experience. His expertise covers all the bases--from brand development, positioning, and messaging; to brand identity, packaging and labeling; public relations; content marketing; website development; internal/employee communications; and global brand-launch activations. Mark has a Bachelor's and a Master's degree from the University of Western Ontario. "Machine learning" can seem like a scary term, bringing to mind images of the techno-dystopias portrayed in The Matrix, Terminator, and Black Mirror.


How Do Machine Learning Programs "Learn"?

#artificialintelligence

In this article, we look at two machine learning (ML) techniques, the Naive Bayes classifier and neural networks, and demystify how they work. With all the hype surrounding self-driving cars and video-game-playing AI robots, it's worth taking a step back and reminding ourselves how machine learning programs actually "learn". And if you're not sure what machine learning even is, read about the difference between artificial intelligence, machine learning, and deep learning. One common machine learning algorithm is the Naive Bayes classifier, which is used for filtering spam emails.
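The spam-filtering use of Naive Bayes that the article names can be sketched in a few lines; this is a toy illustration with made-up messages, not a production filter (real systems train on large corpora and tune smoothing):

```python
import math
from collections import Counter

def train(messages):
    """messages: list of (text, label) pairs, label in {'spam', 'ham'}."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Pick the label maximizing log P(label) + sum of log P(word|label)."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total)  # log prior
        # add-one (Laplace) smoothing so unseen words don't zero out a class
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

The "naive" part is the assumption that words occur independently given the label; it is wrong for real text, yet the classifier still works well for spam because spam vocabulary is so distinctive.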


UPDATED: Machine learning can fix Twitter, Facebook, and maybe even America

#artificialintelligence

Chris Nicholson co-founded Skymind and Deeplearning4j, the most popular deep-learning framework for Java. Quitting Twitter is easy -- I've done it a hundred times. Someone called it "a clown car that drove into a gold mine," and like all clown cars, Twitter makes the passengers get out once in a while. If I go back, it's because I'm addicted. For an information junkie, that little bubble is hard to resist. But Twitter -- and Facebook, for that matter -- is desperately broken in ways that alienate users, spread hate, and endanger us as a species.